169 research outputs found

    Multi-Character Field Recognition for Arabic and Chinese Handwriting

    Two methods, Symbolic Indirect Correlation (SIC) and Style Constrained Classification (SCC), are proposed for recognizing handwritten Arabic and Chinese words and phrases. SIC reassembles variable-length segments of an unknown query that match similar segments of labeled reference words. Recognition is based on the correspondence between the order of the feature vectors and of the lexical transcript in both the query and the references. SIC implicitly incorporates language context in the form of letter n-grams. SCC is based on the notion that the style (distortion or noise) of a character is a good predictor of the distortions arising in other characters, even of a different class, from the same source. It is adaptive in the sense that with a long-enough field, its accuracy converges to that of a style-specific classifier trained on the writer of the unknown query. Neither SIC nor SCC requires the query words to appear among the references

    An Open Architecture for End-to-End Document Analysis Benchmarking

    ISBN: 978-1-4577-1350-7. In this paper we present a fully operational, scalable, and open architecture for performing end-to-end document analysis benchmarking without needing to develop the whole pipeline. By decomposing the whole analysis process into coarse-grained tasks, and by building upon community-provided state-of-the-art algorithms, our architecture allows virtually any combination of elementary document analysis algorithms, regardless of their running system environment, programming language, or data structures. Its flexible structure makes it very straightforward to plug in new experimental algorithms, compare them to other equivalent algorithms, and observe their effects on end-to-end tasks without the need to install, compile, or otherwise interact with any software other than one's own

    A Platform for Storing, Visualizing, and Interpreting Collections of Noisy Documents

    The goal of document image analysis is to produce interpretations that match those of a fluent and knowledgeable human when viewing the same input. Because computer vision techniques are not perfect, the text that results when processing scanned pages is frequently noisy. Building on previous work, we propose a new paradigm for handling the inevitable incomplete, partial, erroneous, or slightly orthogonal interpretations that commonly arise in document datasets. Starting from the observation that interpretations are dependent on application context or user viewpoint, we describe a platform now under development that is capable of managing multiple interpretations for a document and offers an unprecedented level of interaction so that users can freely build upon, extend, or correct existing interpretations. In this way, the system supports the creation of a continuously expanding and improving document analysis repository which can be used to support research in the field

    Towards Improved Paper-Based Election Technology

    Resources are presented for fostering paper-based election technology. They comprise a diverse collection of real and simulated ballot and survey images, and software tools for ballot synthesis, registration, segmentation, and ground-truthing. The grids underlying the designated location of voter marks are extracted from 13,315 degraded ballot images. The actual skew angles of sample ballots, recorded as part of complete ballot descriptions compiled with the interactive ground-truthing tool, are compared with their automatically extracted parameters. The average error is 0.1 degrees. These results provide a baseline for the application of digital image analysis to the scrutiny of electoral ballots
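The comparison of recorded and automatically extracted skew angles can be illustrated with a minimal sketch. It assumes the reported average error is a mean absolute difference in degrees, which the abstract does not state; the function name and the sample angles are hypothetical and are not taken from the paper's data.

```python
from typing import Sequence


def mean_absolute_skew_error(truth_deg: Sequence[float],
                             estimated_deg: Sequence[float]) -> float:
    """Average absolute difference between two lists of skew angles (degrees).

    Assumption: "average error" is read as mean absolute error.
    """
    if len(truth_deg) != len(estimated_deg):
        raise ValueError("angle lists must have the same length")
    if not truth_deg:
        raise ValueError("at least one ballot is required")
    total = sum(abs(t - e) for t, e in zip(truth_deg, estimated_deg))
    return total / len(truth_deg)


# Hypothetical angles for two ballots: ground truth vs. automatic estimate.
error = mean_absolute_skew_error([1.0, -0.5], [1.2, -0.5])
```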

    The DAE Platform: a Framework for Reproducible Research in Document Image Analysis

    We present the DAE Platform in the specific context of reproducible research. DAE was developed at Lehigh University and is targeted at the Document Image Analysis research community for distributing document images and associated document analysis algorithms, as well as an unlimited range of annotations and ground truth for benchmarking and evaluation of new contributions to the state of the art. DAE was conceived from the beginning with reproducibility and data provenance in mind. In this paper we more specifically analyze how this approach answers a number of challenges raised by the need to provide fully reproducible experimental research. Furthermore, since DAE has been up and running without interruption since 2010, we are in a position to provide a qualitative analysis of the technological choices made at the time, and to suggest some new perspectives in light of more recent technologies and practices

    Characterizing Challenged Minnesota Ballots

    Photocopies of the ballots challenged in the 2008 Minnesota elections, which constitute a public record, were scanned on a high-speed scanner and made available on a public radio website. The PDF files were downloaded, converted to TIF images, and posted on the PERFECT website. Based on a review of relevant image-processing aspects of paper-based election machinery and on additional statistics and observations on the posted sample data, robust tools were developed for determining the underlying grid of the targets on these ballots regardless of skew, clipping, and other degradations caused by high-speed copying and digitization. The accuracy and robustness of a method based on both index-marks and oval targets are demonstrated on 13,435 challenged ballot page images

    Evaluation of Voting with Form Dropout Techniques for Ballot Vote Counting

    Vote counting accuracy has become a well-known issue in the vote collection process. Digital image processing techniques can be incorporated in the analysis of printed election ballots. Current image processing techniques in the vote collection process are heavily dependent on the anticipated, geometric positioning of the vote. These techniques don’t account for markings made outside of the requested field of input. Using various form dropout techniques, however, every mark on the form can be extracted and used by the machine to make an intelligent decision. Most methods will still miss a few marks and result in a few false alarms. This paper explores methods of voting between the results of the different mark extraction methods to improve recognition. To provide diversity a simple image subtraction technique is paired with a distance transform and a morphology based algorithm. The result has a higher detection rate and a lower false alarm rate
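The idea of voting between several mark extraction results can be sketched as follows. This is only an illustration under stated assumptions: each method is assumed to output a binary mask of the same size (1 = mark pixel), and the combination rule is assumed to be a pixel-wise majority vote. The abstract does not specify the voting scheme, and the function name and tiny masks below are hypothetical.

```python
from typing import List

Mask = List[List[int]]  # binary image: 1 = detected mark pixel, 0 = background


def majority_vote(masks: List[Mask]) -> Mask:
    """Combine binary masks pixel by pixel; a pixel is kept when more than
    half of the input masks flag it (assumed rule, not the paper's)."""
    if not masks:
        raise ValueError("at least one mask is required")
    rows, cols = len(masks[0]), len(masks[0][0])
    for m in masks:
        if len(m) != rows or any(len(row) != cols for row in m):
            raise ValueError("all masks must have the same shape")
    threshold = len(masks) // 2 + 1
    return [
        [1 if sum(m[r][c] for m in masks) >= threshold else 0
         for c in range(cols)]
        for r in range(rows)
    ]


# Hypothetical outputs of the three extraction methods named in the abstract.
subtraction = [[1, 0, 0], [0, 1, 1]]
distance = [[1, 1, 0], [0, 1, 0]]
morphology = [[0, 0, 0], [0, 1, 1]]
combined = majority_vote([subtraction, distance, morphology])
```

With three voters, a pixel flagged by only one method is discarded as a likely false alarm, while a pixel missed by only one method is still kept.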